20th March, 2024
EDA created by Jahanavi Desai
UBC - Data Visualization in Python

Exploratory Data Analysis:

Trees Data of Vancouver

Foreword

This notebook presents an exploratory data analysis of a subset of the Vancouver Street Trees dataset, located here. The data were obtained from the City of Vancouver's Open Data Portal and follow the Open Government License – Vancouver, available here.

Motivation:

Vancouver is known for the beauty of its mountains, trees, ocean, lakes, and scenic views. The greenery in my neighbourhood makes me fall in love with nature. It makes me wonder: with the city's growing economy, how has tree planting changed over the years? Does it affect the number of trees around us? Looking at the statistical data on the City of Vancouver website here, it is fascinating to see these changes across different neighbourhoods and their tree plantings. Are some species of trees planted more than others? What is the relation between diameter and height? We will be able to address these questions using an interactive dashboard.

Question(s) of interest

In this analysis, I will be investigating questions associated with the Vancouver Street Trees dataset. I am interested in finding out:
1) Which species has the highest diameter and height range in the Kerrisdale neighbourhood? I am curious because I live in this neighbourhood and want to know which species the trees I see daily belong to.
2) How useful is the map representation with the interactive scatter plot of diameter and height for each neighbourhood?
3) How many trees are planted in each neighbourhood, by planting area?
4) How many trees are on the even side and the odd side of the street?
5) Is it possible to make an interactive dashboard with all of this interactivity?

Dataset description

The descriptions below were taken directly from the website where the datasets were obtained, the City of Vancouver's Open Data Portal, and follow the Open Government License – Vancouver.

The dataset contains street tree data from Vancouver, including information such as tree species, diameter, location, and age. Let's begin by understanding the structure of the dataset and exploring the columns of interest:

We will be using the subset of the data available from this URL here.

Our Data Schema:

| Column | Description |
| --- | --- |
| tree_id | Tree's unique ID |
| civic_number | Street address at which the tree is located |
| std_street | Street name at which the tree is located |
| genus_name | Genus name of the tree |
| species_name | Species name of the tree |
| cultivar_name | Cultivar name of the tree |
| common_name | Common name of the tree |
| assigned | Indicates whether the address is made up to associate the tree with a nearby lot (Y = Yes, N = No) |
| root_barrier | Root barrier installed (Y = Yes, N = No) |
| plant_area | B = behind sidewalk, C = cutout, G = in tree grate, L = lane, N = no sidewalk, P = park. Numeric value indicates boulevard width in feet |
| on_street_block | The street block on which the tree is physically located |
| on_street | The name of the street on which the tree is physically located |
| neighbourhood_name | City's defined local area in which the tree is located |
| street_side_name | The street side on which the tree is physically located (Even, Odd or Median (Med)) |
| height_range_id | 0-10 for every 10 feet (e.g., 0 = 0-10 ft, 1 = 10-20 ft, 2 = 20-30 ft, and 10 = 100+ ft) |
| height_range | Height range of the tree, measured in feet |
| diameter | DBH in inches (DBH stands for diameter of tree at breast height) |
| curb | Curb presence (Y = Yes, N = No) |
| date_planted | Date the tree was planted, in YYYY-MM-DD format |
| latitude | Latitude at which the tree is located |
| longitude | Longitude at which the tree is located |

Information about the Data:

Data Wrangling and learning about the Data

Let's see what the table looks like.

Let's get some more information about the trees dataframe.

The sets table has $579$ rows and $6$ columns.

We have null values in three columns: 'date_planted', 'plant_area' and 'cultivar_name'.
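A quick way to find which columns contain nulls is to count them per column. This is only a minimal sketch on a made-up toy dataframe: the name `trees_df` and all the values in it are assumptions for illustration, not the real data.

```python
import pandas as pd

# Toy stand-in for the trees dataframe (values are made up for illustration).
trees_df = pd.DataFrame({
    "tree_id": [1, 2, 3, 4],
    "date_planted": ["2001-01-15", None, None, "2010-06-02"],
    "plant_area": ["B", None, "10", "c"],
    "cultivar_name": [None, None, "RUBRUM", None],
})

# Count missing values per column, then keep only the columns that have any.
null_counts = trees_df.isna().sum()
columns_with_nulls = null_counts[null_counts > 0]
print(columns_with_nulls)
```

Filtering on `null_counts > 0` keeps the output short when the dataframe has many columns.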

Certain columns, such as cultivar_name and on_street, have null values and are of no use for this analysis.

Let's use the plant_area column to get the best insights. First, let's deal with its missing values and the inconsistent case of its values.

We also have to deal with the null values in the date column. As we know, the date is in YYYY-MM-DD format. The planted date of new trees is added after every planting season, usually at the beginning of January and June.
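One possible way to handle these two columns is sketched below on a toy dataframe. The "Unknown" placeholder and the choice to drop rows without a planting date are assumptions made for this sketch, not necessarily the approach used in the notebook.

```python
import pandas as pd

# Toy stand-in for the trees dataframe (values are made up for illustration).
trees_df = pd.DataFrame({
    "plant_area": ["B", "c", None, "10", "g"],
    "date_planted": ["2001-01-15", None, "2010-06-02", None, "1999-12-01"],
})

# Normalize the case of the plant_area codes and label missing entries.
# "Unknown" is a placeholder chosen for this sketch.
trees_df["plant_area"] = trees_df["plant_area"].str.upper().fillna("Unknown")

# Parse the YYYY-MM-DD strings; missing values become NaT.
trees_df["date_planted"] = pd.to_datetime(
    trees_df["date_planted"], format="%Y-%m-%d"
)

# One option (an assumption here): keep only rows with a known planting date.
dated_trees = trees_df.dropna(subset=["date_planted"])
print(dated_trees)
```

Dropping rows is the simplest choice, but it discards trees with no recorded date, so it is worth checking how many rows are lost before doing this on the real data.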

I think this dataframe has certain columns that we don't need for our analysis. We can either drop those columns or make a child dataframe from the trees dataframe with only the columns we need.
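Making a child dataframe could look like the sketch below. The list in `columns_needed` is a hypothetical selection for illustration, and the toy values are made up.

```python
import pandas as pd

# Toy stand-in for the trees dataframe (values are made up for illustration).
trees_df = pd.DataFrame({
    "tree_id": [1, 2, 3],
    "species_name": ["PLATANOIDES", "RUBRUM", "SERRULATA"],
    "cultivar_name": [None, "BOWHALL", None],
    "on_street": ["W 41ST AV", "YEW ST", "ELM ST"],
    "neighbourhood_name": ["KERRISDALE", "KITSILANO", "KERRISDALE"],
    "diameter": [36.5, 8.0, 12.0],
    "height_range_id": [5, 2, 3],
})

# Hypothetical selection of the columns needed for the analysis.
columns_needed = [
    "tree_id", "species_name", "neighbourhood_name",
    "diameter", "height_range_id",
]

# .copy() makes the child independent, so later edits do not touch trees_df.
new_trees_df = trees_df[columns_needed].copy()
print(new_trees_df.head())
```

Selecting columns keeps the original dataframe intact, which makes it easy to go back and pull in another column later.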

Now we have a pretty clean dataframe to work on with no null values and with only the columns that are of use.

For our first visualization, let's get the answer for our first question:

1) Which one of the top 5 species has the highest diameter and height range in the Kerrisdale neighbourhood?

Let's make an interactive legend to get the height and diameter for the respective neighbourhood:

It's hard to see which species has the highest height or diameter overall in each neighbourhood.

First, we will get all the unique values of the neighbourhood column:
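Getting the unique neighbourhood names can be sketched as follows, assuming the column is called `neighbourhood_name` as in the schema. The toy values are made up, and sorting the names is an optional extra that makes a dropdown easier to scan.

```python
import pandas as pd

# Toy stand-in for the cleaned dataframe (values are made up for illustration).
new_trees_df = pd.DataFrame({
    "neighbourhood_name": [
        "KERRISDALE", "KITSILANO", "KERRISDALE", "MARPOLE", "KITSILANO",
    ],
})

# Unique neighbourhood names, sorted alphabetically for use as dropdown options.
neighbourhoods = sorted(new_trees_df["neighbourhood_name"].dropna().unique())
print(neighbourhoods)
```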

We will make a dropdown selection for all the neighbourhoods we have.

Here we will make a dataframe object containing the top 5 species in the neighbourhood:
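A sketch of one way to build such a dataframe is shown below. It assumes that "top 5" means the five most frequently planted species in the neighbourhood; the toy data and the names `kerrisdale`, `top5_species` and `top5_df` are made up for illustration.

```python
import pandas as pd

# Toy data: six species in Kerrisdale with different counts, plus another area.
species = (
    ["PLATANOIDES"] * 6 + ["RUBRUM"] * 5 + ["SERRULATA"] * 4
    + ["CERASIFERA"] * 3 + ["BETULUS"] * 2 + ["AMERICANA"] * 1
)
new_trees_df = pd.DataFrame({
    "neighbourhood_name": ["KERRISDALE"] * len(species) + ["KITSILANO"] * 2,
    "species_name": species + ["RUBRUM", "AMERICANA"],
})

# Keep one neighbourhood, then find its five most common species.
kerrisdale = new_trees_df[new_trees_df["neighbourhood_name"] == "KERRISDALE"]
top5_species = kerrisdale["species_name"].value_counts().head(5).index.tolist()

# Keep only the rows that belong to those five species.
top5_df = kerrisdale[kerrisdale["species_name"].isin(top5_species)]
print(top5_species)
```

If "top 5" should instead mean the largest trees, the same pattern works with a `groupby` on `species_name` and a maximum over `diameter`.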

We will add the species to our combined interactive chart so that we can select which of the top 5 species has the highest diameter and height:

Adding color to the output points, and adding the selections for species and neighbourhood to the city_radio_plot:

There is a positive relation between the diameter of a tree and its height range.
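The strength of such a relation can also be checked numerically with a correlation coefficient. The sketch below uses made-up toy values, so the number it prints says nothing about the real dataset; it only shows the calculation, assuming the columns `diameter` and `height_range_id` from the schema.

```python
import pandas as pd

# Toy values (made up): diameter in inches and the matching height range id.
sample_df = pd.DataFrame({
    "diameter": [3.0, 6.0, 10.0, 15.0, 22.0, 30.0],
    "height_range_id": [1, 1, 2, 3, 4, 5],
})

# Pearson correlation between the two columns; values near +1 mean that
# larger diameters tend to go with higher height ranges.
r = sample_df["diameter"].corr(sample_df["height_range_id"])
print(round(r, 3))
```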

Answer: Here it is clear that the Platanoides species has the highest height_range and diameter in Kerrisdale, at 5 and 36.5 respectively. We can find more insights from the interactive chart, such as the smallest diameter in each species and neighbourhood having h.

2) How useful is the map representation with the interactive scatter point plot with Diameter and Height for each Neighbourhood?

To add the Vancouver map to the plot, we will first load the data from this URL and then store it in a dataframe variable.

Making a base map is essential, so that we can directly add the specifications from our new_trees_df later.

Next we make a map with all the neighbourhood data in our new_trees_df. We use the neighbourhood name as the key and match it across both dataframe objects, which gives us a beautiful Vancouver map.

I want to set the title of the plot in the center

Answer: This chart is visually appealing and provides information such as diameter and height range through interactivity. However, a widget chart, or even a color legend, is preferred for getting this information.

3) Number of trees planted in each neighbourhood by planting area?

We make a selection for the neighbourhood and then a bar chart to represent it.

The plant area column has both categorical and numerical values. We can use this plot to see which details we have per species per neighbourhood.
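The counts behind such a bar chart can be sketched with a group-by on neighbourhood and plant area. The toy values and the output column name `tree_count` are assumptions for illustration.

```python
import pandas as pd

# Toy stand-in for the cleaned dataframe (values are made up for illustration).
new_trees_df = pd.DataFrame({
    "neighbourhood_name": [
        "KERRISDALE", "KERRISDALE", "KERRISDALE", "KITSILANO", "KITSILANO",
    ],
    # plant_area mixes letter codes with numeric boulevard widths, kept as text.
    "plant_area": ["B", "B", "10", "C", "B"],
})

# Number of trees for every (neighbourhood, plant area) pair.
area_counts = (
    new_trees_df
    .groupby(["neighbourhood_name", "plant_area"])
    .size()
    .reset_index(name="tree_count")
)
print(area_counts)
```

Keeping `plant_area` as text treats a width such as "10" as its own category next to the letter codes, which matches how a bar chart would show it.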

Answer: This graph is very user friendly thanks to its interactivity. It is easy to navigate and to check the number of trees planted and their area type.

4) Number of trees on the even side and odd side of the street

We will make an object to use instead of the whole dataframe, which holds the count for each street side name:
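Such a summary object could be built as in the sketch below. The toy values and the names `side_counts` and `tree_count` are made up for illustration.

```python
import pandas as pd

# Toy stand-in for the cleaned dataframe (values are made up for illustration).
new_trees_df = pd.DataFrame({
    "street_side_name": ["EVEN", "ODD", "ODD", "MED", "EVEN", "ODD"],
})

# Count the trees on each street side and turn the result into a small
# two-column dataframe that can be passed to a chart.
side_counts = (
    new_trees_df["street_side_name"]
    .value_counts()
    .rename_axis("street_side_name")
    .reset_index(name="tree_count")
)
print(side_counts)
```

Passing this small summary to a chart, rather than the full dataframe, keeps the chart specification light.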

For a better comparison of the number of trees planted on each side, a circle plot works best.

Here it is pretty clear that the highest number of trees, 2554, is planted on the odd side of the street.

Creating a dashboard that includes the four plots, two of which interact with each other.

Conclusion:

In this EDA I have drawn out useful information about the trees in Vancouver. I feel that the number of trees planted per year could have been a great way to see how planting has changed over the years and what we need to do to keep it consistent going forward. Apart from this, I have found the top 5 species of trees in the city and seen that their height and diameter are positively correlated, which suggests that the larger the diameter of a tree, the more likely it is to be taller. Still, this varies across neighbourhoods and species. Furthermore, we have an interactive bar chart and a scatter plot showing the area in which trees are planted per neighbourhood and the total number of trees planted. Making the dashboard was the fun part: adding all four charts together and making them interact with each other, with the 2 widgets in one chart and a click selection that works across the map and the bar chart.

Resources used